33 research outputs found

Extracting Multilingual Natural-Language Patterns for RDF Predicates

Author: A.-C. Ngonga Ngomo
S. Auer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Abstract. Most knowledge sources on the Data Web were extracted from structured or semi-structured data. Thus, they encompass solely a small fraction of the information available on the document-oriented Web. In this paper, we present BOA, a bootstrapping strategy for extracting RDF from text. The idea behind BOA is to extract natural-language patterns that represent predicates found on the Data Web from unstructured data by using background knowledge from the Data Web. These patterns are used to extract instance knowledge from natural-language text. This knowledge is finally fed back into the Data Web, therewith closing the loop. The approach followed by BOA is quasi independent of the language in which the corpus is written. We demonstrate our approach by applying it to four different corpora and two different languages. We evaluate BOA on these data sets using DBpedia as background knowledge. Our results show that we can extract several thousand new facts in one iteration with very high accuracy. Moreover, we provide the first multilingual repository of natural-language representations of predicates found on the Data Web.

CiteSeerX

Crossref

Distributed Holistic Clustering on Linked Data

Author: A Saeedi
A-C Ngonga Ngomo
E Rahm
I Megdiche
K Hildebrandt
M Nentwig
Publication date: 30/08/2017

Link discovery is an active field of research to support data integration in the Web of Data. Due to the huge size and number of available data sources, efficient and effective link discovery is a very challenging task. Common pairwise link discovery approaches do not scale to many sources with very large entity sets. Here, we propose a distributed holistic approach to link many data sources based on a clustering of entities that represent the same real-world object. Our clustering approach provides a compact and fused representation of entities, and can identify errors in existing links as well as many new links. We support a distributed execution of the clustering approach to achieve faster execution times and scalability for large real-world data sets. We provide a novel gold standard for multi-source clustering, and evaluate our methods with respect to effectiveness and efficiency for large data sets from the geographic and music domains.

arXiv.org e-Print Archive

Crossref

DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data

Author: A.-C. Ngonga Ngomo
B. Bishop
C. Bizer
E. Minack
F. Belleau
J. Broekstra
J. Lehmann
S. Auer
Z. Pan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Abstract. Triple stores are the backbone of increasingly many Data Web applications. It is thus evident that the performance of those stores is mission critical for individual projects as well as for data integration on the Data Web in general. Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triple store implementations. In this paper, we propose a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational and triple stores and, thus, settled on measuring performance against a relational database which had been converted to RDF by using SQL-like queries. In contrast to those approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful to compare existing triple stores and provide results for the popular triple store implementations Virtuoso, Sesame, Jena-TDB, and BigOWLIM. The subsequent comparison of our results with other benchmark results indicates that the performance of triple stores is by far less homogeneous than suggested by previous benchmarks.

CiteSeerX

Crossref

Seasonal changes in social networks of giraffes

Author: Bennett Nigel C., 1961-
Burroughs R.E.J. (Richard)
Ganswindt Andre
Ngonga Ngomo A.-C.
Wolf Tanja Esther
Publication venue: 'Wiley'
Publication date: 01/06/2018

Fission‐fusion social societies allow animals to respond in a flexible manner to environmental changes by adapting the size and composition of a group. Although group members change frequently in these systems, associations with preferred partners may be found. In this study, we examined the grouping patterns of a population of 80 individual giraffes in a fenced South African game reserve over a 12‐month period. Using social network analyses as a tool to evaluate observed associations, we subsequently analysed both sex‐ and season‐related grouping patterns within the study population. Mixed sex groups represented 49% of all groups observed, and although overall group composition distribution did not differ significantly between seasons, the number of encountered single females decreased by 50%, whereas the number of multi‐male groups increased by over 50% in winter. Overall average group size did not differ significantly between seasons, but significantly larger multi‐female and multi‐male groups were seen in winter. Within the social network, two distinct clusters were found in summer, with the population more divided in winter, with five distinct clusters emerging. The strongest ties (highest HWIGs) were found between adult females. Our study revealed that giraffes not only live in a highly flexible social fission‐fusion system, but also show seasonal patterns of grouping. The National Research Foundation, South Africa and a postgraduate bursary from the DST-NRF SARChI chair for Mammal behavioural ecology and physiology to (NCB). http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1469-7998

UPSpace at the University of Pretoria

Discovering linkage points over web data

Author: Arasu A.
Bizer C.
Burdick D.
Christen P.
Dhamanka R.
Duan S.
Euzenat J.
Hassanzadeh O.
Hernández M. A.
Hutter F.
Isele R.
Kang J.
Lenzerini M.
Naumann F.
Ngonga Ngomo A.-C.
Rahm E.
Robertson S.
Salton G.
Warren R. H.
Zhang M.
Publication venue: 'VLDB Endowment'

Crossref

HELIOS – Execution Optimization for Link Discovery

Author: A. Nikolov
A.-C. Ngonga Ngomo
D. Song
P. Griffiths Selinger
P. Shvaiko
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

Using Caching for Local Link Discovery on Large Data Sets

Author: A-C Ngonga Ngomo
A-CN Ngomo
EJ O’Neil
K Lyko
M Arlitt
R Karedla
S Podlipnig
W-C Hou
Publication venue: 'Springer Science and Business Media LLC'

Crossref

When to Reach for the Cloud: Using Parallel Hardware for Link Discovery

Author: A. Nikolov
A.-C. Ngonga Ngomo
A.C. Ngonga Ngomo
A.K. Elmagarmid
C. Xiao
H. Köpcke
J. Dean
L. Kolb
N. Heino
R. Isele
S. Auer
Publication date: 01/01/2013

Abstract. With the ever-growing amount of RDF data available across the Web, the discovery of links between datasets and deduplication of resources within knowledge bases have become tasks of crucial importance. Over the last years, several link discovery approaches have been developed to tackle the runtime and complexity problems that are intrinsic to link discovery. Yet, so far, little attention has been paid to the management of hardware resources for the execution of link discovery tasks. This paper addresses this research gap by investigating the efficient use of hardware resources for link discovery. We implement the HR³ approach for three different parallel processing paradigms including the use of GPUs and MapReduce platforms. We also perform a thorough performance comparison for these implementations. Our results show that certain tasks that appear to require cloud computing techniques can actually be accomplished using standard parallel hardware. Moreover, our evaluation provides break-even points that can serve as guidelines for deciding on when to use which hardware for link discovery.

CiteSeerX

Crossref

WOMBAT - a generalization approach for automatic link discovery

Author: Lehmann J.
Ngonga Ngomo A.-C.
Sherif M.A.
Publication date: 01/01/2017

A significant portion of the evolution of Linked Data datasets lies in updating the links to other datasets. An important challenge when aiming to update these links automatically under the open-world assumption is the fact that usually only positive examples for the links exist. We address this challenge by presenting and evaluating Wombat, a novel approach for the discovery of links between knowledge bases that relies exclusively on positive examples. Wombat is based on generalisation via an upward refinement operator to traverse the space of link specifications. We study the theoretical characteristics of Wombat and evaluate it on 8 different benchmark datasets. Our evaluation suggests that Wombat outperforms state-of-the-art supervised approaches while relying on less information. Moreover, our evaluation suggests that Wombat’s pruning algorithm allows it to scale well even on large datasets.

Fraunhofer-ePrints

ROCKER

Author: Atencia M.
Auer S.
Michelson M.
Ngonga Ngomo A.-C.
Scharffe F.
Soru T.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref